{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Naive Bayes Classifier from scratch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table of Contents\n",
    "\n",
    "- [Introduction](#Introduction)\n",
    "- [Conditional Probability and Bayes Theorem refresher](#cp-and-bt)\n",
    "- [The Mathematical model of Naive Bayes Algorithm](#mm)\n",
    "- [Naive Bayes with a simple example](#ex)\n",
    "- [Exercise](#exer)\n",
    "\n",
    "\n",
    "## Introduction\n",
    "\n",
    "Naive Bayes Algorithm is a probability based classification technique for classifying labelled data. It makes use of Bayes Theorem in order to predict the class of the given data. It is called Naive because it makes an assumption that the features of the dependent variable are mutually independent of each other. Although it is named naive, it is a very efficient model, often used as a baseline for text classification and recommender systems.\n",
    "\n",
    "## Conditional Probability and Bayes Theorem Refresher <a name=\"cp-and-bt\"></a>\n",
    "\n",
    "As already mentioned, naive bayes is a probabilistic model and depends on bayes theorem for prediction. Hence, understanding of conditional probability and bayes theorem is the key to understanding this algorithm.\n",
    "\n",
    "Conditional probability is defined as the probability of an event occuring given that another event has already occured. For example, suppose we rolled a dice and we know that the number that came out is an even number. Now if we want to find the probability of getting a 2 on the dice, it is expressed using conditional probability. Mathematically, conditional probability is defined as follows:-"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$ \\large \\large P(A|B) = \\frac{P(A \\bigcap B)}{P(B)} $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bayes theorem is a very elegant theroem based on conditional probability that describes the probability of an event based on prior knowledge of conditions that might be related to the event. It is named after Thomas Bayes. It is mathematically defined as follows:-\n",
    "\n",
    "$$ \\large P(A|B) = \\frac{P(B|A)P(A)}{P(B)} $$\n",
    "\n",
    "where A and B are two events.\n",
    "\n",
    "Each term in the above equation have been given a special name:-\n",
    "\n",
    "P(A|B) is known as Posterior Probability <br />\n",
    "P(B|A) is known as Likelihood Probability <br />\n",
    "P(A) is known as Prior Probability, and <br />\n",
    "P(B) is known as Evidence Probability <br />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Mathematical model of Naive Bayes Algorithm <a name=\"mm\"></a>\n",
    "\n",
    "Suppose we have a point $x$ as follows:-\n",
    "\n",
    "$$ \\large x=[x_1, x_2, x_3,....,x_n]$$\n",
    "\n",
    "Our task is to assign a class or a label to this point 'k'. If we have 'k' classes, then we have to find the probability of the point $x$ belonging to class $C_k$. The class with highest probability will be assigned as the label of $x$. The probablity of a class $C_k$ given $x$ can be calculated using Bayes Theorem as follows:-\n",
    "\n",
    "$$\\large P(C_k|x) = \\frac{P(x|C_k)P(C_k)}{P(x)} \\;\\;\\;\\;\\;\\;\\; - \\;\\;(i)$$\n",
    "\n",
    "So, to summarize, if our dataset has 3 classes (setosa, virginica and versicolor for example, then we have to calculate P(setosa|x), P(virginica|x) and P(versicolor|x) and the highest probability will be assigned as the label x.\n",
    "\n",
    "<a name='omitting'></a>Now, in our algorithm, we can omit the Evidence term, because is will remain constant for all the probabilities. This is done just to simplify the computations.\n",
    "\n",
    "Now, $P(C_k|x)$ can also be written as $P(C_k, x)$, and if we replace $x$ with its value, we get $P(C_k, x_1, x_2, x_3, ...., x_n)$. So, till now, we have basically transformed $P(C_k, x)$ into $$P(C_k, x_1, x_2, x_3, ...., x_n)\\;\\;\\;\\;\\;\\;\\; -\\;\\;\\;-(ii) $$. Things will start to get interesting now. \n",
    "\n",
    "In eq (ii), we can interchanging the terms inside the parenthesis won't change it's meaning. So, I am shifting the $C_k$ to the end. So, our equation will look like this - $P(x_1, x_2, x_3, ...., x_n, C_k)$. Now, if consider $x_1$ as event A and remaining terms as event B and apply bayes theorem, we will get:- \n",
    "\n",
    "$$\\large P(x_1, x_2, x_3, ...., x_n, C_k) = P(x_1|x_2, x_3, ...., x_n, C_k)P(x_2, x_3, ...., x_n, C_k)\\;\\;\\;\\;\\;\\;-(iii)\\;\\;\\;$$\n",
    "\n",
    "(Omitting the deniminator term as discussed [here](#omitting))\n",
    "\n",
    "If we keep applying bayes theorem in equation (iii), we will get:-\n",
    "\n",
    "$ \\quad \\quad P(C_k, x_1, x_2, x_3, ...., x_n) = P(C_k, x_1, x_2, x_3, ...., x_n)$ <br />\n",
    "$ \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad  = P(x_1|x_2, x_3, ...., x_n, C_k)P(x_2, x_3, ...., x_n, C_k)$ <br />\n",
    "$  \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad  = P(x_1|x_2, x_3, ...., x_n, C_k)P(x_2| x_3, ...., x_n, C_k)P(x_3, ...., x_n, C_k)$ <br />\n",
    "$  \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad  = \\;\\;...$ <br />\n",
    "$  \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad  = P(x_1|x_2, x_3, ...., x_n, C_k)P(x_2| x_3, ...., x_n, C_k)P(x_3, ...., x_n, C_k).....P(x_{n-1}|x_n, C_k)P(x_n|C_k)P(C_k) \\;\\;-(iv)$ <br />\n",
    "\n",
    "Now, in naive bayes, we assume that the features are conditionally independent of each other. If features are independent then:-\n",
    "\n",
    "$$ \\large P(x_i|x_{i+1}, ..., x_n, C_k) = P(x_i|C_k)$$\n",
    "\n",
    "If we apply this rule in equation (iv) we get:-\n",
    "\n",
    "$\\quad \\quad P(C_k, x_1, x_2, x_3, ...., x_n) = P(x_1|x_2, x_3, ...., x_n, C_k)P(x_2| x_3, ...., x_n, C_k)P(x_3, ...., x_n, C_k).....P(x_{n-1}|x_n, C_k)P(x_n|C_k)P(C_k)$\n",
    "$  \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad= P(C_k)P(x_1|C_k)P(x_2|C_k)P(x_3|C_k).....$ <br/>\n",
    "$  \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad \\quad= P(C_k)\\pi_{i=0}^{n}P(x_i|C_k)$\n",
    "\n",
    "Hence,\n",
    "\n",
    "$$ \\large P(C_k|x_n) = P(C_k)\\prod_{i=0}^{n}P(x_i|C_k) $$\n",
    "\n",
    "This is how we predict in Naive Bayes Algorithm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Naive Bayes with a simple example<a name=\"ex\"></a>\n",
    "\n",
    "To understand the naive bayes with ta simple example, check out [this](http://shatterline.com/blog/2013/09/12/not-so-naive-classification-with-the-naive-bayes-classifier/) blog by [shatterline](http://shatterline.com/blog)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise<a name=\"exer\"></a>\n",
    "\n",
    "The objective of this excercise is to solidify your understanding of Naive Bayes algorithm by implementing it from scratch. You will be creating a class **NaiveBayes** and defining the methods in it that learn from data and predict. After implementing this class, you will run it on the same dataset shown in [this](#ex) example.\n",
    "\n",
    "You can refer the solution if you feel stuck somewhere. Also, one thing you need to be make sure is to use log of probabilities that you calculate. This is important because the probabilities you calculate will have very large decimal places but python will store only the 1st 16 places. This could lead to some discrepancies in the result so make sure to use log probabilities. If would also encourage you to comment your code as it is a good practice.\n",
    "\n",
    "Good luck"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Importing the libraries and loding the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(\n",
    "        { \n",
    "            'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast', 'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'], \n",
    "            'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild', 'Mild', \"Hot\", 'Mild'],\n",
    "            'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],\n",
    "            'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],\n",
    "            'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']\n",
    "        }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = df.iloc[:, :-1]\n",
    "y = df.iloc[:, -1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Start coding here..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class NaiveBayes:\n",
    "    def __init__(self, X, y):\n",
    "        '''\n",
    "            This method initializes all the required data fields of the NaiveBayes class\n",
    "            \n",
    "            Input -\n",
    "                X: A pandas dataframe consisting of all the dependent variables\n",
    "                y: A pandas dataframe consisting of labels\n",
    "        '''\n",
    "        # Initializing the Dependent and Independent Variables\n",
    "        self.X = X\n",
    "        self.y = y\n",
    "        \n",
    "        # Initializing the column name y. (came in handy for me. If you do not require it, then you can delete it)\n",
    "        self.y_label = y.columns[0]\n",
    "        \n",
    "        # Initializing the variables to store class priors. Initiallt set to None. The will the assigned the correct values by\n",
    "        # executing the calculate_prior method\n",
    "        # p_pos is probability of positive class\n",
    "        # p_neg is probability of negative class\n",
    "        self.p_pos = None\n",
    "        self.p_neg = None\n",
    "        \n",
    "        # A dictionary to store all likelihood probabilities\n",
    "        self.likelihoods = {}\n",
    "        \n",
    "        # Executing calculate_prior and calculate_likelihood to calculate prior and likelihood probabilities\n",
    "        self.calculate_prior()\n",
    "        self.calculate_likelihood()\n",
    "        \n",
    "    \n",
    "    def calculate_prior(self):\n",
    "        '''\n",
    "            Method for calculating the prior probabilities\n",
    "            \n",
    "            Input - None\n",
    "            \n",
    "            Expected output: Expected to assign p_pos and p_neg their correct log probability values. No need to return anything\n",
    "        '''\n",
    "        # write your code here ...\n",
    "    \n",
    "    def calculate_likelihood(self):\n",
    "        '''\n",
    "            Method for calculating the all the likelihood probabilities\n",
    "            \n",
    "            Input - None\n",
    "            \n",
    "            Expected output: Expected to create a dictionary of likelihood probabilities and assign it to likelihoods.\n",
    "        '''\n",
    "        # write your code here ...\n",
    "    \n",
    "    def predict(self, test_data):\n",
    "        '''\n",
    "            A method to predict the label for the input\n",
    "            \n",
    "            Input -\n",
    "                test_data: A dataframe of dependent variables\n",
    "                \n",
    "            Expected output: Expected to return a dataframe of predictions. The column name of dataframe should match column name of y\n",
    "        '''\n",
    "        # write your code here ...\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Test if your Code is working as expected..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the object\n",
    "nb = NaiveBayes(X, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check if your code is predicting correctly\n",
    "assert nb.predict(X).equals(pd.DataFrame({'Play': ['No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No']})) == True, 'The prediction received is wrong. Kindly recheck your code. Refer the solution if you find yourself stuck somewhere'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
